We propose to model the acoustic space of deep neural network (DNN) class-conditional posterior probabilities as a union of low-dimensional subspaces. To that end, the training posteriors are used for dictionary learning and sparse coding. Sparse representation of the test posteriors using this dictionary enables projection onto the space of the training data. Relying on the fact that the intrinsic dimensions of the posterior subspaces are indeed very small and that the matrix of all posteriors belonging to a class has very low rank, we demonstrate how these low-dimensional structures enable further enhancement of the posteriors and rectify the spurious errors due to mismatch conditions. The enhanced acoustic modeling method leads to improvements in a continuous speech recognition task using the hybrid DNN-HMM (hidden Markov model) framework in both clean and noisy conditions, where up to a 15.4% relative reduction in word error rate (WER) is achieved.
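To make the pipeline concrete, the following is a minimal sketch of the dictionary-learning and sparse-coding step described above, using scikit-learn's DictionaryLearning and sparse_encode as stand-ins for whatever solver the paper actually employs. The function name enhance_posteriors, the single global dictionary (rather than per-class dictionaries), and the hyperparameters n_atoms and n_nonzero are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode


def enhance_posteriors(train_post, test_post, n_atoms=200, n_nonzero=5):
    """Project test posteriors onto the space spanned by training posteriors.

    train_post, test_post: (n_frames, n_classes) arrays of DNN posteriors.
    n_atoms, n_nonzero: illustrative dictionary size and sparsity level.
    """
    # Learn a dictionary of atoms from the training posteriors.
    learner = DictionaryLearning(
        n_components=n_atoms,
        transform_algorithm="omp",
        transform_n_nonzero_coefs=n_nonzero,
        max_iter=50,
        random_state=0,
    )
    learner.fit(train_post)
    D = learner.components_  # shape: (n_atoms, n_classes)

    # Sparse-code each test frame over the learned dictionary.
    codes = sparse_encode(
        test_post, D, algorithm="omp", n_nonzero_coefs=n_nonzero
    )

    # Reconstruct: each test frame becomes a sparse combination of
    # training-derived atoms, i.e. a projection onto the training space.
    enhanced = codes @ D

    # Map back to the probability simplex (clip negatives, renormalise rows).
    enhanced = np.clip(enhanced, 1e-12, None)
    enhanced /= enhanced.sum(axis=1, keepdims=True)
    return enhanced
```

The enhanced posteriors can then replace the raw DNN outputs when computing the scaled likelihoods fed to the HMM decoder; in the paper's framing, the low-rank, low-dimensional structure of each class's posterior subspace is what makes this sparse reconstruction suppress mismatch-induced errors.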